8356165: System.in in jshell replace supplementary characters with ?? #25079

lahodaj · 2025-05-07T06:44:54Z

When reading from System.in in a JShell snippet, JShell first reads the whole line (getting a String), and then converts the characters from this String to bytes on demand. However, it does not convert supplementary code points (those encoded as surrogate pairs) correctly: it tries to convert each surrogate separately, which cannot work.
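To illustrate the failure mode, here is a minimal standalone sketch (not JShell code) that encodes each half of a surrogate pair on its own and compares that with encoding the pair as a whole. It assumes UTF-8 explicitly. The original code calls `getBytes()` with no argument, which uses the platform default charset (UTF-8 by default since JDK 18, JEP 400). `String.getBytes(Charset)` replaces malformed input, such as a lone surrogate, with the charset's default replacement, which for UTF-8 is `?`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SurrogateEncodingDemo {

    // Encodes each UTF-16 code unit on its own, similar to the old per-char conversion.
    static byte[] encodePerChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            out.writeBytes(String.valueOf(s.charAt(i)).getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String smiley = "\uD83D\uDE03"; // U+1F603: one code point, two chars
        // Each lone surrogate is malformed on its own, so each becomes '?'
        System.out.println(new String(encodePerChar(smiley), StandardCharsets.UTF_8)); // ??
        // Encoding the pair together yields the 4-byte UTF-8 sequence F0 9F 98 83
        System.out.println(Arrays.toString(smiley.getBytes(StandardCharsets.UTF_8))); // [-16, -97, -104, -125]
    }
}
```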

The proposal herein is to, when the current character is a high surrogate, peek at the next character, and if it is a low surrogate, convert both the high and low surrogates to bytes together.
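The proposal can be sketched as a small self-contained class. This is a hypothetical, simplified stand-in. The field and method names (`pendingLine`, `pendingLinePointer`, `pendingBytes`, `pendingBytesPointer`, `readUserInputChar`, `readUserInput`) mirror the ones visible in the diff hunks of this PR, but the real `ConsoleIOContext` reads lines from the terminal rather than from a fixed String. The sketch also uses UTF-8 explicitly instead of the default charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of the surrogate-aware byte conversion.
public class SurrogateAwareInput {
    private final String pendingLine;
    private int pendingLinePointer = 0;
    private byte[] pendingBytes;
    private int pendingBytesPointer;

    SurrogateAwareInput(String line) {
        this.pendingLine = line;
    }

    private char readUserInputChar() {
        return pendingLine.charAt(pendingLinePointer++);
    }

    /** Returns the next byte (0-255), or -1 once the line is exhausted. */
    public synchronized int readUserInput() {
        if (pendingBytes == null || pendingBytes.length <= pendingBytesPointer) {
            if (pendingLinePointer >= pendingLine.length()) {
                return -1;
            }
            char userChar = readUserInputChar();
            StringBuilder dataToConvert = new StringBuilder().append(userChar);
            // A high surrogate followed by a low surrogate is one code point:
            // peek at the next char and, if it completes the pair, encode both together.
            if (Character.isHighSurrogate(userChar)
                    && pendingLine.length() > pendingLinePointer
                    && Character.isLowSurrogate(pendingLine.charAt(pendingLinePointer))) {
                dataToConvert.append(readUserInputChar());
            }
            pendingBytes = dataToConvert.toString().getBytes(StandardCharsets.UTF_8);
            pendingBytesPointer = 0;
        }
        return pendingBytes[pendingBytesPointer++] & 0xFF;
    }

    static byte[] readAll(String line) {
        SurrogateAwareInput in = new SurrogateAwareInput(line);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.readUserInput()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : readAll("\uD83D\uDE03")) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // F0 9F 98 83
    }
}
```

A lone high surrogate that is not followed by a low surrogate still falls back to single-char conversion, so it is replaced with `?` as before.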

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8356165: System.in in jshell replace supplementary characters with ?? (Bug - P3)

Reviewers

Christian Stein (@sormuras - Committer)
Adam Sotona (@asotona - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25079/head:pull/25079
$ git checkout pull/25079

Update a local copy of the PR:
$ git checkout pull/25079
$ git pull https://git.openjdk.org/jdk.git pull/25079/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25079

View PR using the GUI difftool:
$ git pr show -t 25079

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25079.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-05-07T06:45:53Z

👋 Welcome back jlahoda! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-05-07T06:46:20Z

@lahodaj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8356165: System.in in jshell replace supplementary characters with ??

Reviewed-by: cstein, asotona

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 241 new commits pushed to the master branch:

6a58af3: 8357143: New test AOTCodeCompressedOopsTest.java fails on platforms without AOT Code Cache support
84a98ab: 8357166: Many AOT tests failed with VM crash
fbc12be: 8349151: Refactor test/java/security/cert/CertificateFactory/slowstream.sh to java test
... and 238 more: https://git.openjdk.org/jdk/compare/9f8fbf292278d995c9fa112d8f97b2375f619537...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2025-05-07T06:46:54Z

@lahodaj The following label will be automatically applied to this pull request:

kulla

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-05-07T06:49:26Z

Webrevs

sormuras · 2025-05-07T10:31:02Z

src/jdk.jshell/share/classes/jdk/internal/jshell/tool/ConsoleIOContext.java

@@ -977,7 +977,15 @@ public void perform(LineReaderImpl in) throws IOException {
    public synchronized int readUserInput() throws IOException {
        if (pendingBytes == null || pendingBytes.length <= pendingBytesPointer) {
            char userChar = readUserInputChar();
-            pendingBytes = String.valueOf(userChar).getBytes();
+            StringBuilder dataToConvert = new StringBuilder();


Perhaps add the comment from the PR description here, for future readers:

[...] when the current character is a high surrogate, peek at the next character, and if it is a low surrogate, convert both the high and low surrogates to bytes together.

The (internal) API used in the implementation doesn't express that at first sight.

Thanks for adding comments.

tats-u · 2025-05-08T14:55:50Z

src/jdk.jshell/share/classes/jdk/internal/jshell/tool/ConsoleIOContext.java

+                if (pendingLine.length() > pendingLinePointer &&
+                    Character.isLowSurrogate(pendingLine.charAt(pendingLinePointer))) {
+                    dataToConvert.append(readUserInputChar());
+                }


How about calling readUserInputChar() unconditionally and then, only when the result is an isolated code unit rather than the second half of a surrogate pair, undoing the read with pendingLinePointer--?

pendingLinePointer-- is unlikely to happen for normal input, other than in penetration tests.
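The reviewer's "read, then unread" alternative can be sketched as follows. This is a hypothetical illustration, not code from the PR: the class, the `nextBytes` method, and the bounds check are assumptions, and only `pendingLine`, `pendingLinePointer`, and `readUserInputChar` echo names from the diff.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: read the next char unconditionally and push it back
// (pendingLinePointer--) if it does not complete a surrogate pair.
public class ReadThenUnreadInput {
    private final String pendingLine;
    private int pendingLinePointer;

    ReadThenUnreadInput(String line) {
        this.pendingLine = line;
    }

    private char readUserInputChar() {
        return pendingLine.charAt(pendingLinePointer++);
    }

    /** Returns the encoded bytes of the next code point, or null at end of input. */
    byte[] nextBytes() {
        if (pendingLinePointer >= pendingLine.length()) {
            return null;
        }
        char userChar = readUserInputChar();
        if (Character.isHighSurrogate(userChar) && pendingLinePointer < pendingLine.length()) {
            char next = readUserInputChar(); // read without peeking first
            if (Character.isLowSurrogate(next)) {
                return (String.valueOf(userChar) + next).getBytes(StandardCharsets.UTF_8);
            }
            pendingLinePointer--; // isolated high surrogate: unread the extra char
        }
        return String.valueOf(userChar).getBytes(StandardCharsets.UTF_8);
    }

    static byte[] readAll(String line) {
        ReadThenUnreadInput in = new ReadThenUnreadInput(line);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk;
        while ((chunk = in.nextBytes()) != null) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(readAll("\uD83D\uDE03x").length);                         // 5
        System.out.println(new String(readAll("\uD83Dx"), StandardCharsets.UTF_8)); // ?x
    }
}
```

Compared with peeking, this trades a `charAt` lookahead for an occasional pointer decrement. As the reviewer notes, the decrement should only happen for malformed input.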

tats-u · 2025-05-08T15:21:25Z

test/langtools/jdk/jshell/InputUITest.java

+            inputSink.write("new String(System.in.readNBytes(4))\n\uD83D\uDE03\n");
+            waitOutput(out, "\"\uD83D\uDE03\"");


I think the following is more robust:

- inputSink.write("new String(System.in.readNBytes(4))\n\uD83D\uDE03\n");
- waitOutput(out, "\"\uD83D\uDE03\"");
+ inputSink.write("new String(System.in.readNBytes(5))\n\uD83D\uDE031\n");
+ waitOutput(out, "\"\uD83D\uDE031\"");

tats-u · 2025-05-12T22:54:30Z

I forgot to explain the context:

* Incrementing the number of input bytes to read ensures that no byte or code unit is read twice or skipped
* Providing extra input prevents the test from hanging due to input starvation (especially on builds without this fix)
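The byte arithmetic behind the suggested `readNBytes(5)` can be checked in isolation. U+1F603 takes 4 bytes in UTF-8 and `1` takes 1, so 5 bytes cover exactly the emoji plus the extra character. The sketch below uses a `ByteArrayInputStream` as a stand-in for JShell's `System.in`; it does not exercise JShell itself, and `readFirst` is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadNBytesDemo {

    // Hypothetical helper: reads the first n bytes of the UTF-8 encoding of `typed`.
    static String readFirst(String typed, int n) {
        InputStream in = new ByteArrayInputStream(typed.getBytes(StandardCharsets.UTF_8));
        try {
            return new String(in.readNBytes(n), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 4 bytes for U+1F603 plus 1 byte for '1'
        System.out.println("\uD83D\uDE031".getBytes(StandardCharsets.UTF_8).length); // 5
        System.out.println(readFirst("\uD83D\uDE031\n", 5).equals("\uD83D\uDE031"));  // true
    }
}
```

With the old per-char conversion the same line would arrive as `??1\n`, only 4 bytes. In a live session, a 5-byte read could then block waiting for more input, which is the starvation concern described above.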

lahodaj · 2025-05-13T11:51:01Z

> I forgot to explain the context:
>
> * Incrementing the number of input bytes to read ensures that no byte or code unit is read twice or skipped
>
> * Providing extra input prevents the test from hanging due to input starvation (especially on builds without this fix)

I missed the additional 1 there. But I think the current code, although more complex, satisfies the constraints as well (we read the end-of-line and check for it).

tats-u · 2025-05-13T14:30:11Z

I trust you have considered the difference in EOL length between Windows and Unix.

lahodaj · 2025-05-19T09:51:26Z

> I trust you have considered the difference in EOL length between Windows and Unix.

Yes, the test handles both Unix and Windows EOL (that's the complicating factor).
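The updated test code is not shown in this thread. As a purely hypothetical illustration of the idea (read the payload bytes, then accept either end-of-line convention), a helper could look like this. `readLinePayload` and `streamOf` are invented names, not part of InputUITest.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads exactly n payload bytes, then requires the line to end
// with either "\n" (Unix) or "\r\n" (Windows).
public class EolAwareRead {

    static String readLinePayload(InputStream in, int n) {
        try {
            byte[] payload = in.readNBytes(n);
            int b = in.read();
            if (b == '\r') {
                b = in.read(); // Windows EOL: skip the CR and expect LF next
            }
            if (b != '\n') {
                throw new IllegalStateException("expected end of line after " + n + " bytes, got " + b);
            }
            return new String(payload, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static InputStream streamOf(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(readLinePayload(streamOf("\uD83D\uDE03\n"), 4).equals("\uD83D\uDE03"));   // true
        System.out.println(readLinePayload(streamOf("\uD83D\uDE03\r\n"), 4).equals("\uD83D\uDE03")); // true
    }
}
```

Checking for the end of line right after the payload catches both a skipped byte and a byte read twice, because either case shifts the EOL away from the expected position.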

tats-u · 2025-05-19T10:11:42Z

I suspect I misunderstood the argument to readNBytes in the test case. Sorry to have bothered you.

sormuras

Looks good to me.

asotona

Looks good to me.

tats-u · 2025-05-19T12:11:46Z

src/jdk.jshell/share/classes/jdk/internal/jshell/tool/ConsoleIOContext.java

@@ -977,7 +977,20 @@ public void perform(LineReaderImpl in) throws IOException {
    public synchronized int readUserInput() throws IOException {
        if (pendingBytes == null || pendingBytes.length <= pendingBytesPointer) {
            char userChar = readUserInputChar();
-            pendingBytes = String.valueOf(userChar).getBytes();
+            StringBuilder dataToConvert = new StringBuilder();


FWIW I think we can avoid using StringBuilder (and make the code more RAM-friendly):

    char[] dataToConvert = { userChar, '\0' };
    // if (...) {
    //     ...
    //     if (...) {
    //         ...
    //         dataToConvert[1] = lowSurrogate;
    //     }
    //     ...
    // }
    // a low-surrogate code unit is never the null char
    pendingBytes = (dataToConvert[1] != '\0'
            ? String.valueOf(dataToConvert)
            : String.valueOf(dataToConvert[0])).getBytes();
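The `char[]` idea can be made runnable and compared against the `StringBuilder` version. This is an illustrative sketch only. The two-argument `toBytes` methods are hypothetical, with `next` standing in for the peeked character (pass `'\0'` when there is none, since `'\0'` is never a low surrogate), and UTF-8 is used explicitly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharArrayConversion {

    // char[] variant: slot 1 stays '\0' unless a low surrogate completes the pair.
    static byte[] toBytes(char userChar, char next) {
        char[] dataToConvert = { userChar, '\0' };
        if (Character.isHighSurrogate(userChar) && Character.isLowSurrogate(next)) {
            dataToConvert[1] = next;
        }
        String text = dataToConvert[1] != '\0'
                ? String.valueOf(dataToConvert)
                : String.valueOf(dataToConvert[0]);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // StringBuilder variant, for comparison.
    static byte[] toBytesWithBuilder(char userChar, char next) {
        StringBuilder sb = new StringBuilder().append(userChar);
        if (Character.isHighSurrogate(userChar) && Character.isLowSurrogate(next)) {
            sb.append(next);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(toBytes('\uD83D', '\uDE03'), toBytesWithBuilder('\uD83D', '\uDE03'))); // true
        System.out.println(toBytes('a', '\0').length); // 1
    }
}
```

Both variants produce the same bytes. The difference is only in how the temporary data is held, which, as discussed below, the JIT may optimize anyway.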

The next version of .NET is said to be able to allocate such a tiny array on the stack instead of the heap, but I don't know whether the JVM can do the same optimization.

lahodaj

We could do something like that, although I wrote it this way mostly because it is more clearly correct. That said, the current tests probably cover all the cases, so with a bit of work we could probably eliminate the (explicit) array completely.

Overall, in most places it is usually not necessary to be too clever: the VM can optimize and eliminate allocations if needed.

I'll leave it up to Adam and/or Christian whether they would prefer slightly more complex code with less (explicit/visible) allocation.
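For illustration, one way to drop the explicit buffer is to combine the two surrogates into a code point and encode that code point directly. This sketch is an assumption about what "eliminating the array" could look like. It is not necessarily what commit 6a07648 does, and `toBytes` is a hypothetical helper. `Character.toString(int)` requires Java 11 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CodePointConversion {

    // Hypothetical helper: `next` is the peeked char ('\0' if none).
    static byte[] toBytes(char userChar, char next) {
        if (Character.isSurrogatePair(userChar, next)) {
            int codePoint = Character.toCodePoint(userChar, next); // e.g. U+1F603
            return Character.toString(codePoint).getBytes(StandardCharsets.UTF_8);
        }
        // Not a valid pair: fall back to single-char conversion (lone surrogates become '?')
        return String.valueOf(userChar).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toBytes('\uD83D', '\uDE03')));             // [-16, -97, -104, -125]
        System.out.println(new String(toBytes('\uD83D', 'x'), StandardCharsets.UTF_8)); // ?
    }
}
```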

FWIW, we could do this:
lahodaj@6a07648

Readability of the code is my preference unless performance is absolutely critical (which is not the case here).

lahodaj@6a07648

Better than my suggestion.

The current code using StringBuilder is not bad either, since passing a very long string to JShell would be shooting ourselves in the foot anyway.

You can /integrate the current code with StringBuilder.

lahodaj · 2025-05-20T06:03:30Z

/integrate

openjdk · 2025-05-20T06:04:36Z

Going to push as commit e961b13.
Since your change was applied there have been 250 commits pushed to the master branch:

f8d7f66: 8356998: Convert -Xlog:cds to -Xlog:aot (step 2)
7077535: 8356595: Convert -Xlog:cds to -Xlog:aot (step1)
39d8d10: 8348906: InstanceOfTree#getType doesn't specify when it returns null
... and 247 more: https://git.openjdk.org/jdk/compare/9f8fbf292278d995c9fa112d8f97b2375f619537...master

Your commit was automatically rebased without conflicts.

openjdk · 2025-05-20T06:04:44Z

@lahodaj Pushed as commit e961b13.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

lahodaj and others added 2 commits May 6, 2025 10:14

8356165: System.in in jshell replace supplementary characters with ??

7acff90

Fixing tests.

42052fc

openjdk bot added the rfr Pull request is ready for review label May 7, 2025

openjdk bot added the kulla label May 7, 2025

sormuras reviewed May 7, 2025

tats-u reviewed May 8, 2025

lahodaj added 2 commits May 12, 2025 15:55

Reflecting review feedback

a81e226

(Attempting to) fix the test on Windows.

3f7fd5c

sormuras approved these changes May 19, 2025

asotona approved these changes May 19, 2025

openjdk bot added the ready Pull request is ready to be integrated label May 19, 2025

tats-u reviewed May 19, 2025

openjdk bot added the integrated Pull request has been integrated label May 20, 2025

openjdk bot closed this May 20, 2025

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 20, 2025
